Epidemic variability in complex networks 
We study numerically the variability of the outbreak of diseases on complex networks. We use an SI model
to simulate the disease spreading at short times in homogeneous and in scale-free networks. In both cases, we
study the effect of initial conditions on the epidemic dynamics and its variability. The results display a time
regime during which the prevalence exhibits a large sensitivity to noise. We also investigate the dependence of
the infection time of a node on its degree and its distance to the seed. In particular, we show that the infection
time of hubs has non-negligible fluctuations which limit their reliability as early-detection stations. Finally,
we discuss the effect of the multiplicity of paths between two nodes on the infection time. In particular, we
demonstrate that the existence of even long paths reduces the average infection time. These different results
could be of use for the design of time-dependent containment strategies.

PACS numbers: 89.75.Hc, 87.23.Ge, 87.19.Xx 



I. INTRODUCTION 



Many complex systems display a very heterogeneous degree
distribution characterized by a power-law decay of the form
$P(k) \sim k^{-\gamma}$. This form implies the absence of a
characteristic scale, hence the name "scale-free network" (SFN).
Among these networks, a certain number are of great interest to
epidemiology, and it is thus very important to understand the
effect of their topology on the spreading dynamics of a disease.
One of the most relevant results is that disease spreading does
not show an endemic threshold in SFN when the population size is
infinite and $\gamma < 3$. This result means that a disease
propagates very easily on a large SFN whatever the value of its
transmission probability. In addition, recent studies showed that
the presence of hubs in SFN not only facilitates the spread of a
disease but also dramatically accelerates its outbreak.

The long-tailed degree distribution of SFN is the signature of
the presence of a non-negligible number of highly connected
nodes. These hubs were already identified in the epidemiological
literature as superspreaders. Consequently, from a public health
point of view, studying the spreading of epidemics on SFN is all
the more appropriate. Superspreading events affect the basic
reproductive number $R_0$, a widely used epidemiological
parameter, making its estimate from real-world data difficult.
As a matter of fact, it seems that superspreading events appeared
in the onset of the recent SARS outbreak [23-26] and could be
crucial for new emergent diseases and bioterrorist threats. Their
potential threat justifies detailed studies of the impact of the
degree distribution on the initial stage of epidemics.



The variability plays an important role in the accuracy and the
forecasting capabilities of numerical models and thus has to be
quantified in order to assess the meaningfulness of simulations
with respect to real outbreaks [20]. Using a numerical approach,
we analyze the evolution of epidemics generated by different sets
of initial parameters, both for SFN and homogeneous random
networks (RN). We use the Barabasi-Albert model (BA) for
generating a SFN and the Erdos-Renyi network (ER) [27] as a
prototype for RN. Concerning the epidemic modeling, a simple and
classical approach is to consider that individuals are only in
two distinct states, infected (I) or susceptible (S). There is
initially a number $i_0 N$ of infected individuals and any
infected node can pass the disease to its neighbors [19, 21]. The
probability per unit time to transmit the disease (the spreading
rate) is denoted by $\lambda$, and once a susceptible node is
infected it remains in this state. In more elaborate models, an
infected individual can change its state to another category, for
example, coming back to susceptible (SIS), or going to immunized
or dead (SIR). This $S \to I$ approach (SI), in spite of its
simplicity, is a good approximation at short times to more
refined models such as the SIS or SIR models. The SI model on
both SFN and RN is thus well adapted to the characterization of
the variability of the initial stages of epidemic outbreaks
spreading in complex networks, which is the focus of this article.
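
As an illustration of the SI dynamics described above, the following Python sketch simulates an outbreak in discrete time on a network stored as an adjacency list. The synchronous update, the helper names `erdos_renyi` and `simulate_si`, and the reading of $\lambda$ as a per-step transmission probability along each link are assumptions made for this example; it is not the code used to produce the results reported here.

```python
import random

def erdos_renyi(n, mean_degree, rng):
    """Adjacency list of a G(n, p) random graph with p = mean_degree / (n - 1).
    (Hypothetical helper; quadratic in n, so only suitable for small n.)"""
    p = mean_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def simulate_si(adj, seeds, lam, steps, rng):
    """Discrete-time SI dynamics (assumed update rule).

    At every step, each node infected before that step infects each of its
    susceptible neighbours with probability lam. Returns (prevalence, t_inf):
    prevalence[t] is the infected fraction after t steps, and t_inf[v] is the
    infection time of node v (None if not infected within `steps`).
    """
    n = len(adj)
    t_inf = [None] * n
    for s in seeds:
        t_inf[s] = 0
    infected = list(seeds)
    prevalence = [len(infected) / n]
    for t in range(1, steps + 1):
        newly = []
        for u in infected:
            for v in adj[u]:
                if t_inf[v] is None and rng.random() < lam:
                    t_inf[v] = t          # infected once, stays infected
                    newly.append(v)
        infected.extend(newly)            # new cases transmit from step t + 1
        prevalence.append(len(infected) / n)
    return prevalence, t_inf
```

Calling `simulate_si(erdos_renyi(200, 6, rng), [0], 0.01, 500, rng)` with `rng = random.Random(1)` returns one prevalence trajectory and the infection time of every node; repeating the call with different seeds or random streams gives the ensemble over which variability can be measured.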

The outline of the paper is the following. In section II, we
study the fluctuations of the prevalence and we identify
different parameters controlling them. In particular, we
highlight the effects due to different realizations of the
network as well as different initial conditions. We also
investigate the influence of the node degree on the prevalence
variability. In section III, we present results on the infection
time and its variation with the degree and with the distance
from the origin of infections. We also discuss the effect of the
number of paths between two nodes on the infection time.
Finally, we discuss our results and conclude in section IV.



II. PREVALENCE FLUCTUATIONS

A. Intra- and inter-network fluctuations

We analyze in this section the effect of the underlying net- 
work topology on the variability of outbreaks. It is indeed 
important to understand whether the local fluctuations of the 
structure of the network can have a large impact on the devel- 
opment of epidemics. 

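In order to analyze this effect, we measure the variability of
outbreaks as the relative variation of the prevalence (density
of infected individuals $i(t)$), given by the coefficient of
variation

$$ CV[i(t)] = \frac{\sqrt{\langle i(t)^2 \rangle - \langle i(t) \rangle^2}}{\langle i(t) \rangle}. \qquad (1) $$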


In order to evaluate this quantity we run simulations for
different "model sets": first, a given number of outbreaks on a
single network; second, a single outbreak on each of several
different networks; and finally, several outbreaks on different
networks. We show in Fig. 1 the curves $CV[i(t)]$ computed for
both the RN (thin lines) and the SFN (bold lines) and for two of
these model sets: $10^3$ outbreaks spreading on the same network
(dashed curves), and a single outbreak per network for $10^3$
different networks (plain curves). The curves representing these
two model sets are nearly superimposed for both network
topologies. The curves obtained from model sets made of 10
outbreaks on 100 networks and 100 outbreaks on 10 networks
coincide with the other model sets (not shown in the figure).
These results indicate that the contribution to the variability
of $i(t)$ given by a particular network realization is
essentially the same as the one generated by different outbreaks
on the same network. This confirms the intuitive idea that
sampling different parts of a large network is equivalent to
averaging over different networks. Consequently, studying the
variability of epidemics simulated on one large enough network
(intra-network) will lead practically to the same conclusions as
studying variability on several instances of that network
(inter-network). Furthermore, it means that the results
described in the next sections for one network can be
generalized to any instance of BA and ER networks.
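
To make the measurement concrete, the sketch below computes the coefficient of variation of the prevalence, time step by time step, from a set of simulated trajectories. The function name `coefficient_of_variation` and the list-of-lists input format are assumptions of this example; the same function applies whether the trajectories come from one network or from several.

```python
import math

def coefficient_of_variation(runs):
    """CV[i(t)] = sqrt(<i^2> - <i>^2) / <i> at each time step.

    `runs` is a list of prevalence trajectories of equal length
    (assumed input format). Returns 0.0 where the mean prevalence is zero.
    """
    n_runs = len(runs)
    cv = []
    for values in zip(*runs):                 # all runs at a fixed time t
        mean = sum(values) / n_runs
        var = sum(x * x for x in values) / n_runs - mean * mean
        var = max(var, 0.0)                   # guard against rounding error
        cv.append(math.sqrt(var) / mean if mean > 0 else 0.0)
    return cv
```

Comparing the intra-network and inter-network model sets then amounts to calling this function once on trajectories generated on a single network and once on trajectories generated on independently drawn networks, and overlaying the two resulting curves.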

Figure 1: Evolution in time of the coefficient of variation of
the density of infected ($CV[i(t)]$) in BA networks (bold) and ER
networks (thin) for outbreaks simulated on the same network
(dashed curves), or on different networks (plain curves). The
results are obtained for $\lambda = 0.01$ and on networks of size
$N = 10^4$ nodes and average degree $\langle k \rangle = 6$.

Fig. 1 also reveals interesting facts about the time behavior of
$CV$ on complex networks. Since the initial prevalence is fixed
and is the same for all instances, $CV$ is initially equal to
zero and can only increase. At very large times, almost all nodes
are infected, implying that $\lim_{t \to \infty} CV = 0$. This
argument implies the existence of a peak which, as shown in
Fig. 1, is located for BA networks at the beginning of the
outbreak, with a maximum value larger than the one obtained for
ER networks. In order to characterize the relation between the
variability peak and the network heterogeneity, we define
$\tau_v$ as the time at which the maximum of $CV[i(t)]$ is
reached. We also use the fact that the heterogeneity of the
network degree, often quantified by
$\kappa = \langle k^2 \rangle / \langle k \rangle$, is related to
the typical outbreak timescale $\tau$ given by [14, 15]

$$ \tau = \frac{\langle k \rangle}{\lambda \left( \langle k^2 \rangle - \langle k \rangle \right)}. \qquad (2) $$

A discussion of the validity of this equation is provided in
Ref. [16]. In order to understand to which regime $\tau_v$
corresponds, we plot in Fig. 2 $\tau_v$ versus $\tau$ for BA
networks with different values of $\kappa$. We use networks with
different sizes (from $N = 5 \times 10^3$ to $N = 5 \times 10^4$
nodes) and with different values of $\langle k \rangle$
($6 \le \langle k \rangle \le 60$) in order to obtain a broad
range of $\tau$ values.

We see in Fig. 2 that $\tau_v$ increases linearly with $\tau$
(with a prefactor of order 4). This implies that $\tau_v$ is of
the same order as the typical time $\tau_0$ at which the
diversity of degree classes of infected nodes is the largest
($\tau_0 \approx 6\tau$) [14, 15]. The result
$\tau_v \sim \tau_0$ confirms the intuitive idea that the
variability is maximal when the diversity of different classes of
infected nodes is the largest, which happens at the beginning of
the spread.
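
The two quantities used in this comparison depend only on the first two moments of the degree distribution, so they can be evaluated directly from a degree sequence. The sketch below assumes the timescale $\tau = \langle k \rangle / [\lambda(\langle k^2 \rangle - \langle k \rangle)]$ and the heterogeneity parameter $\kappa = \langle k^2 \rangle / \langle k \rangle$; the function names are hypothetical.

```python
def outbreak_timescale(degrees, lam):
    """tau = <k> / (lam * (<k^2> - <k>)) for a given degree sequence.
    Undefined (division by zero) if every node has degree 1."""
    n = len(degrees)
    k1 = sum(degrees) / n                    # <k>
    k2 = sum(k * k for k in degrees) / n     # <k^2>
    return k1 / (lam * (k2 - k1))

def heterogeneity(degrees):
    """kappa = <k^2> / <k> for a given degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k2 / k1
```

As a worked check, a Poisson degree distribution with $\langle k \rangle = 6$ has $\langle k^2 \rangle = 42$, so that $\tau = 6 / (0.01 \times 36) \approx 16.7$ for $\lambda = 0.01$, close to the value $\tau^{ER} \approx 16.5$ quoted in the caption of Fig. 6.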

B. Effect of degree on $i(t)$ fluctuations

1. Seed degree 

Figure 2: $\tau_v$ versus
$\tau = \langle k \rangle / [\lambda(\langle k^2 \rangle - \langle k \rangle)]$
for several BA networks with $\langle k \rangle$ ranging from 6
to 60, and different sizes (•: $N = 5 \times 10^3$,
▲: $N = 10^4$, ▼: $N = 2 \times 10^4$, ♦: $N = 5 \times 10^4$
nodes; $\lambda = 0.01$). The line is a linear fit with slope of
order 4.

Figure 3: Temporal evolution of the coefficient of variation of
the density of infected ($CV[i(t)]$) in BA networks for outbreaks
seeded with infected nodes of different degrees $k_0$ (from top
to bottom, $k_0$ = 3, 6, 12, 24, 48, 95, 142, 248). Inset:
Initial evolution of the prevalence $i(t)$. The order of the
curves is reversed between both plots (results are averaged over
$5 \times 10^3$ epidemics on one network, with $\lambda = 0.01$,
$N = 10^4$ nodes, $\langle k \rangle = 6$).

In this SI model, the parameter $\lambda$ simply fixes the time
unit. In contrast, we expect that other parameters such as the
degree of the seed may have a more interesting effect on the
outbreak and its variability. Fig. 3 displays the evolution of
$CV[i(t)]$ for outbreaks starting from initial infected nodes
with a given degree $k_0$ (from 3 up to 248). This figure shows
that the variability peak decreases when $k_0$ is increased. In
other words, when an outbreak begins from a highly connected
node, the early stages of the spreading tend to be less variable.
One might think that the number of paths available from a highly
connected node leads to a higher overall variability; this is
however not the case. As shown in the inset of Fig. 3, the
prevalence increases with the seed degree, which may explain the
variability for different $k_0$. Indeed, when the seed is a hub,
the number of infected nodes rapidly becomes very large and thus
leads to smaller relative variations of the prevalence. This
result leads us to investigate more thoroughly the degree of
infected nodes and analyze the differences between BA and ER
networks.
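
A seed-degree experiment of this kind requires a network with a broad degree range and a way to select seeds of a prescribed degree $k_0$. The sketch below is one possible setup, assuming a standard preferential-attachment construction started from a small complete graph; the function names and the choice of initial graph are assumptions of this example and may differ from the generator used for the figures.

```python
import random

def barabasi_albert(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes chosen with probability proportional to their degree.
    Starts from a complete graph on m + 1 nodes (assumed initial condition)."""
    adj = [set() for _ in range(n)]
    stubs = []                       # node v appears deg(v) times in this list
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:      # draw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
            stubs += [new, v]
    return [sorted(nb) for nb in adj]

def nodes_with_degree(adj, k0):
    """Candidate seeds: all nodes whose degree is exactly k0."""
    return [v for v, nb in enumerate(adj) if len(nb) == k0]
```

With `m = 3` the average degree is close to 6. Seeds of degree $k_0$ can then be drawn from `nodes_with_degree(adj, k0)`; for large $k_0$ the list may be empty on a single realization, which is one reason to pool several networks when the whole degree range is needed.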



2. Degree of infected nodes 

In this section, we study in detail the degree properties of 
the infected nodes during the outbreak of the disease. 

Figure 4: Temporal evolution of the density of infection by
classes of degree. (a) Spreading on a BA network. The dashed
lines are given by Eq. (5) with different values for $i$: 0.002,
0.02 (lower to upper) and the plain lines correspond to the
numerical results. (b) Spreading on an ER network. For both
panels the color bar represents the density of infected, where
white means 0.1 and above (the results are computed over $10^3$
outbreaks on networks of size $N = 10^4$ nodes,
$\langle k \rangle = 6$, and spreading rate $\lambda = 0.01$).

For an SI model, the evolution of the density $i_k(t)$ of
infected nodes of degree $k$ is given at the mean-field level by

$$ \frac{d i_k(t)}{dt} = \lambda k \left[ 1 - i_k(t) \right] \Theta_k(t), \qquad (3) $$

where $1 - i_k$ is the density of susceptible nodes of degree $k$
and $\Theta_k$ is the probability that a link pointing to a node
of degree $k$ originates at an infected node [10]. This equation,
studied for an uncorrelated scale-free network and uniform
initial conditions $i_0 = i_k(t = 0)$, leads to the following
behavior at short times

$$ i_k(t) = i_0 \left[ 1 + \frac{k \left( \langle k \rangle - 1 \right)}{\langle k^2 \rangle - \langle k \rangle} \left( e^{t/\tau} - 1 \right) \right], \qquad (4) $$

with $\tau$ defined in Eq. (2).

From this equation, we can deduce the expression for the time
$t_k(i)$ for $i_k$ to reach the value $i$:

$$ t_k(i) = \tau \log \left[ 1 + \frac{\langle k^2 \rangle - \langle k \rangle}{k \left( \langle k \rangle - 1 \right)} \left( \frac{i}{i_0} - 1 \right) \right]. \qquad (5) $$



Figure 5: Temporal evolution of the coefficient of variation of
the density of infected by classes of degree on a BA network. The
range of $CV[i_k(t)]$ is limited to [0, 5] for readability (the
actual results can go up to 30). Color bar accounts for
$CV[i_k(t)]$ (white means 10 and above). These results are
obtained for $N = 10^4$ nodes, $\langle k \rangle = 6$,
$\lambda = 0.01$, and averaged over $10^3$ outbreaks on 50
different networks in order to have data for the whole range of
degrees.

For a fixed prevalence $i$, the time $t_k(i)$ varies very slowly
with $k$ and thus can vary significantly only on a network with a
large range of degree variation. The results are shown in Fig. 4,
which is composed of two contour maps of the temporal evolution
of $i_k$ in both BA and ER networks. In order to simplify the
reading of this figure, the density of infection has been limited
to 0.1 since we are only interested in the beginning of the
outbreaks. We also plot, in Fig. 4(a), the curves corresponding
to Eq. (5) for different values of $i$ (0.002 and 0.02, dashed
curves) and the numerical results for the same values (plain
curves). It can be seen that the predictions of Eq. (5) for small
density and short times are in agreement with the average
behavior obtained from our simulations (the agreement is better
for larger degrees since the hubs are infected at smaller times).
For larger times, the short-time approximation underlying Eq. (5)
is not valid anymore, which explains the observed discrepancy for
larger values of the density such as $i = 0.02$. These results
confirm earlier work [14, 15] on the "cascading effect" of the
spreading, from hubs to poorly connected nodes. Figure 4(b) is
the ER counterpart of Fig. 4(a). It demonstrates that the
hierarchical spreading from well connected to poorly connected
nodes also occurs on homogeneous networks. The cascading effect
however is less visible on the average degree of infected nodes
because of the limited range of degrees (see also Sec. III A).
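
The short-time prediction for $i_k(t)$ and the corresponding time $t_k(i)$ can be evaluated numerically from the first two degree moments. The sketch below assumes the linearized mean-field forms $i_k(t) = i_0 [1 + k(\langle k \rangle - 1)(e^{t/\tau} - 1)/(\langle k^2 \rangle - \langle k \rangle)]$ and its inverse, with $\tau = \langle k \rangle / [\lambda(\langle k^2 \rangle - \langle k \rangle)]$; the function names and argument order are hypothetical.

```python
import math

def short_time_density(t, k, i0, lam, k1, k2):
    """Short-time mean-field density of infected nodes of degree k.
    k1 = <k>, k2 = <k^2>; valid only while i_k(t) remains small."""
    tau = k1 / (lam * (k2 - k1))
    return i0 * (1 + k * (k1 - 1) / (k2 - k1) * (math.exp(t / tau) - 1))

def time_to_reach(i, k, i0, lam, k1, k2):
    """Inverse of short_time_density: time at which i_k reaches i (i >= i0)."""
    tau = k1 / (lam * (k2 - k1))
    return tau * math.log(1 + (k2 - k1) / (k * (k1 - 1)) * (i / i0 - 1))
```

Because $k$ enters only inside the logarithm, `time_to_reach` decreases slowly with the degree: hubs reach a given density first, which is the cascading effect, but a wide range of degrees is needed for the difference to be visible.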

Figure 5 gives a complete picture of the variability of $i_k(t)$
in a heterogeneous network and helps to understand the role of
each degree in the variability peak observed in Fig. 1. It
displays, for a BA network, a contour map of the temporal
evolution of $CV[i_k(t)]$ according to the classes of degree. We
observe that the largest values of $CV[i_k(t)]$ are reached at
the beginning of outbreaks and then decrease during the infection
process. The very high values of $CV$ (white on the plot), which
can be up to 30, are reached during a period lasting until
$6\tau$ (in this plot $\tau \approx 7$). The end of this period
corresponds to the moment when all degree classes are infected.
For superspreaders, Fig. 5 also shows that large fluctuations
persist even at long times, because of their small number in the
network. This result will be confirmed in the next section and
means that their infection time has important fluctuations. For
some outbreaks, the time to reach a superspreader can be long
because of its distance to the seed (see Sec. III B).



III. FLUCTUATIONS OF THE INFECTION TIME 

The randomness of the epidemic process makes it very difficult to
predict an accurate time interval for the infection of a given
node. However, with the same methods used in the previous
section, we can draw the general picture of the distributions of
the infection time $t_{inf}$, defined as the time at which a
given node becomes infected, as a function of the degree of the
node and its distance to the seed (similar considerations have
been studied previously).



A. Effect of the degree 

Fig. 4 shows how the prevalence $i(t)$ varies with the degree.
Time of infection and prevalence being related, we first plot
(Fig. 6) the distribution of the infection time $t_{inf}$ versus
the degree. For this figure we count all the nodes with a given
degree $k$ which have been infected at each instant $t$, and then
we normalize the corresponding results by the number of
individuals with degree $k$ and by the number of simulations.
Each degree is represented by a column where frequencies are
associated with a representative color (right color bar), the sum
of all frequencies in a column being equal to one. Given that a
single BA network does not contain the whole range of degrees,
the plot shown in Fig. 6(a) is based on data from 50 networks.
These results are a consequence of the cascading effect toward
lower degree nodes on both topologies: the larger the degree, the
smaller the average infection time. In addition, we observe that
there is a relatively large range of fluctuation of the infection
time even for large degrees. Indeed, in the inset of Fig. 6(a) we
observe that for highly connected nodes (e.g. $k$ from 80 to
150), the typical $t_{inf}$ varies between $6\tau$ and $13\tau$
(on the plot, $t = 40$ and $90$), which is late for
well-connected nodes. In fact, only a small fraction of the
superspreaders is infected during the early epidemic stages
(until $6\tau$) and triggers the outbreak. Approximately the same
scenario seems to hold for ER networks (Fig. 6(b)), even if the
concept of superspreaders is not the most appropriate for a
network with a small range of degree variation.
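
The counting and normalization procedure described above can be sketched as follows. The input format (one list of infection times per simulated outbreak, with `None` for nodes never reached) and the function name are assumptions of this example.

```python
from collections import defaultdict

def infection_time_frequency(adj, runs_t_inf):
    """freq[k][t]: fraction of (node, run) pairs of degree k infected at time t.

    Counts are normalized by the number of nodes of degree k and by the
    number of runs, as described in the text.
    """
    nodes_per_degree = defaultdict(int)
    for nb in adj:
        nodes_per_degree[len(nb)] += 1
    counts = defaultdict(lambda: defaultdict(int))
    for t_inf in runs_t_inf:
        for v, t in enumerate(t_inf):
            if t is not None:                 # skip nodes never infected
                counts[len(adj[v])][t] += 1
    n_runs = len(runs_t_inf)
    return {k: {t: c / (nodes_per_degree[k] * n_runs) for t, c in by_t.items()}
            for k, by_t in counts.items()}
```

Each column `freq[k]` sums to one only if every node of degree $k$ is eventually infected in every run, so the simulations must be long enough for the outbreak to reach the whole network.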

Figure 6: Frequency of the moment of infection of a node as a
function of its degree. (a) Spreading on a BA network. Inset:
frequency of $t_{inf}$ shown for the beginning of outbreaks on
high-degree nodes. (b) Spreading on an ER network. For both
panels the color bar represents the frequency, where white means
0.02 and above ($N = 10^4$ nodes, $\langle k \rangle = 6$,
$\lambda = 0.01$, $\tau^{BA} \approx 7$,
$\tau^{ER} \approx 16.5$).

In order to understand thoroughly the properties of the infection
time, we also show in Fig. 7 scatter plots of its relative
dispersion $CV(t_{inf})$ versus the degree for both ER and BA
topologies. This figure provides more insight into the behavior
of $t_{inf}$ depicted in Fig. 6. First, for the BA network, nodes
with a given degree $k$ can have a wide range of $CV(t_{inf})$,
which increases with $k$. This demonstrates that even if the
superspreaders are infected at relatively short times, large
relative fluctuations cannot be excluded. In contrast, all nodes
of the ER network have smaller and similar values of
$CV(t_{inf})$, which is consistent with the fact that the
hierarchical spreading is less pronounced on ER networks due to
their limited range of degrees.
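
The relative dispersion of the infection time can be computed either node by node or by pooling all nodes of the same degree. The sketch below illustrates both options under the same assumed input format as before (one list of infection times per run, `None` for nodes not reached); the function names are hypothetical.

```python
import math
from collections import defaultdict

def cv(values):
    """Coefficient of variation (population standard deviation / mean)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(var) / mean if mean > 0 else 0.0

def cv_by_node(runs_t_inf):
    """CV of the infection time of each node across runs (None if no data)."""
    n = len(runs_t_inf[0])
    out = []
    for v in range(n):
        times = [run[v] for run in runs_t_inf if run[v] is not None]
        out.append(cv(times) if times else None)
    return out

def cv_by_degree(adj, runs_t_inf):
    """CV of the infection time pooled over all nodes of the same degree."""
    pooled = defaultdict(list)
    for run in runs_t_inf:
        for v, t in enumerate(run):
            if t is not None:
                pooled[len(adj[v])].append(t)
    return {k: cv(times) for k, times in pooled.items()}
```

The seed always has infection time zero, so its coefficient of variation is reported as 0.0 rather than being undefined.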
Figure 7: Coefficient of variation of the infection time as a
function of the node degree $k$. Gray symbols stand for
$CV(t_{inf})$ computed node by node, and black symbols for
$CV(t_{inf})$ computed over nodes with the same degree
(vertically aligned). ▲ symbols represent the spread on BA
networks, and • symbols stand for ER networks (results are
computed over $10^3$ outbreaks on a single network, $N = 10^4$
nodes, $\langle k \rangle = 6$, with a seed of degree $k_0 = 6$,
$\lambda = 0.01$).

Figure 8: Infection time as a function of the distance $\ell$
from the seed ($N = 10^4$, $\langle k \rangle = 6$, averaged over
$10^3$ outbreaks which start at exactly the same seed of degree
$k_0 = 6$).



B. Effect of distance 

Another important parameter which affects the infection time of a
node is its distance to the seed as measured by the number of
hops of the shortest path. In the networks considered here there
is no spatial component and the distance between two nodes is
given by the smallest number of hops $\ell$ to go from one node
to another. In Fig. 8 we show the relationship between the
average time of infection $\langle t_{inf} \rangle$ and $\ell$
for ER and BA networks. We see on this plot that the infection
time $\langle t_{inf} \rangle$ is always larger for ER than for
BA networks. It means that nodes with the same value of $\ell$,
i.e. at the same distance from the first infected node, have a
lower $\langle t_{inf} \rangle$ if they belong to a BA network.
The reason for this behavior lies in the difference in the
numbers of shortest paths in these networks. Indeed, if we
enumerate these paths, we observe that their numbers differ
between the BA and ER topologies. We have computed the size and
the number of shortest paths between a randomly selected node,
i.e. a potential seed of infection, and the rest of the network,
and we present in Table I the average number of shortest paths at
distance $\ell$. Results are computed over $10^3$ random
selections of the potential seed in order to get an accurate
picture of the network. The table exhibits a difference in the
number of paths for $\ell > 2$ (8% difference for $\ell = 3$, 40%
for $\ell = 4$, 74% for $\ell = 5$), which confirms the fact that
on BA networks, nodes have more paths to go from one to another
in a small number of hops.

Table I: Average number of shortest paths between a randomly
selected node and a node at distance $\ell$ (results are computed
over $10^3$ random selections of an initial node of degree
$k_0 = 6$). $N = 10^4$ nodes, $\langle k \rangle = 6$.

Figure 9: Average infection time of node B as a function of
$\lambda$ for two different configurations. In the first case
infection occurs in one step and in the second case another path
is added. The dotted curve represents the average time of
infection for the first case, $\langle t_{inf} \rangle = 1/\lambda$,
and the plain curve represents $\langle t_{inf} \rangle$ for the
second case and is given by Eq. (8). The results of a numerical
simulation are shown by + symbols.

Table I describes the statistics of shortest paths, but longer
paths also contribute to the spreading of the disease.
Their role can be highlighted by studying the following simple
cases. In the first case an infected node A is in contact with a
susceptible node B. In the second case, there is an additional
path from A to B going through a susceptible node C (see Fig. 9).
In the first "direct" case, the average time of infection
$\langle t^{(1)}_{inf}(B) \rangle$ of B is given by

$$ \langle t^{(1)}_{inf}(B) \rangle = \frac{1}{\lambda}. \qquad (6) $$



The addition of a longer path in the second case (Fig. 9) changes
the behavior of $\langle t_{inf}(B) \rangle$, and Eq. (6) no
longer holds for this case. In fact, the time of infection of the
susceptible node B is given by

$$ t^{(2)}_{inf}(B) = \min \left[ t^{(1)}_{inf}(B),\, t'_{inf}(B) \right], \qquad (7) $$

where $t^{(1)}_{inf}(B)$ is the time of a direct infection
$A \to B$ and $t'_{inf}(B)$ that of an indirect two-step
infection process $A \to C \to B$. The statistics of
$t^{(2)}_{inf}$ can be easily computed and its first moment reads
Figure 10: Coefficient of variation of the infection time as a
function of the distance $\ell$ from the seed. Top panel:
spreading on a BA network; bottom panel: ER network. Both panels
show $CV(t_{inf})$ for every node of a single network,
$N = 10^4$ nodes, $\langle k \rangle = 6$, computed over $10^3$
outbreaks, $\lambda = 0.01$, originating from exactly the same
seed of degree $k_0 = 6$.



Eq. (8) predicts values always smaller than $1/\lambda$ (see
Fig. 9). This result could appear paradoxical since adding a
longer path actually reduces the average infection time. In fact,
the probability that the disease is not transmitted on both paths
is very small and the existence of another path cuts off large
direct infection times and thus reduces the average infection
time of B. Since BA networks have a clustering coefficient larger
than ER networks, this result explains the small difference of
infection times for $\ell = 1$ seen in Fig. 8.
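
The effect of the additional path can be checked with a small Monte Carlo experiment. The sketch below assumes discrete time steps with a per-step transmission probability $\lambda$ on each link, so that each transmission delay is geometrically distributed; the closed form $(3 - 2\lambda)/[\lambda(2 - \lambda)^2]$ used for comparison is derived here under that assumption and is not necessarily the form of Eq. (8). All function names are hypothetical.

```python
import math
import random

def geometric(lam, rng):
    """Number of steps until the first success, success probability lam per step."""
    if lam >= 1.0:
        return 1
    u = 1.0 - rng.random()                       # uniform in (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - lam)))

def mean_infection_time_two_paths(lam, n_samples, rng):
    """Monte Carlo mean of min(direct, indirect) infection time of node B."""
    total = 0
    for _ in range(n_samples):
        direct = geometric(lam, rng)                           # A -> B
        indirect = geometric(lam, rng) + geometric(lam, rng)   # A -> C -> B
        total += min(direct, indirect)
    return total / n_samples

def closed_form_two_paths(lam):
    """Mean of the minimum under the discrete-time assumption of this sketch."""
    return (3 - 2 * lam) / (lam * (2 - lam) ** 2)
```

For small $\lambda$ this expression tends to $3/(4\lambda)$, which is the value obtained for exponentially distributed delays in continuous time, so under either assumption the second path lowers the mean infection time below $1/\lambda$.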

Concerning the relationship between the relative dispersion of
the infection time $CV(t_{inf})$ and $\ell$, its behavior on both
topologies is reported in Fig. 10. This figure shows that the
nodes in both networks exhibit higher values of $CV(t_{inf})$
when they are closer to the seed, i.e. for $\ell < 3$. For larger
distances, $CV(t_{inf})$ is practically constant in both cases.
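
Both the distance $\ell$ of each node from the seed and the number of distinct shortest paths leading to it can be obtained with a single breadth-first search. The sketch below is one standard way to do this; the function names and the adjacency-list input are assumptions of this example.

```python
from collections import deque

def shortest_path_counts(adj, seed):
    """Breadth-first search from `seed`.

    Returns (dist, n_paths): dist[v] is the hop distance from the seed
    (None if unreachable) and n_paths[v] is the number of distinct
    shortest paths from the seed to v.
    """
    n = len(adj)
    dist = [None] * n
    n_paths = [0] * n
    dist[seed] = 0
    n_paths[seed] = 1
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:               # first time v is reached
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:        # u is a predecessor of v
                n_paths[v] += n_paths[u]
    return dist, n_paths

def mean_paths_by_distance(dist, n_paths):
    """Average number of shortest paths to the nodes at each distance > 0."""
    sums, counts = {}, {}
    for d, p in zip(dist, n_paths):
        if d is not None and d > 0:
            sums[d] = sums.get(d, 0) + p
            counts[d] = counts.get(d, 0) + 1
    return {d: sums[d] / counts[d] for d in sums}
```

Averaging `mean_paths_by_distance` over many randomly selected seeds gives a table of the same kind as Table I, and grouping infection times by `dist` gives the dependence of $\langle t_{inf} \rangle$ and $CV(t_{inf})$ on $\ell$.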



IV. CONCLUSIONS 

We have analyzed in detail the variability of a simple epidemic
process on SFN. First, we have shown that different realizations
of BA networks do not display significant statistical differences
in outbreak variability. Consequently, it is statistically
reliable to consider a single realization of the network,
provided it is large enough. We have also shown that the
prevalence fluctuations are maximal during the time regime for
which the diversity of the degrees of the infected nodes is the
largest. In order to analyze this variability in detail, we
examined the temporal degree pattern of infected nodes. In
particular, we demonstrated the high variability of the
superspreaders' prevalence. We found that for the hubs the
infection time is usually small but with fluctuations which can
be large. Even if the hubs are good candidates for being chosen
as surveillance stations given their short average infection
time, they present non-negligible fluctuations which limit their
reliability. In this respect, the ideal detection stations should
be nodes with the best trade-off between a short average
infection time and a high reliability, as given by small
infection time fluctuations.

The topological distance to the seed is also an important
parameter in the epidemic spreading pattern. Nodes at a short
distance from the seed are infected at small times, in the high
variability regime, and thus have a large infection time
variability. Perhaps more surprising is the importance of the
number of paths (not only the shortest one) going from the seed
to another node: the larger this number, the smaller the average
infection time. This is an important conclusion for containment
strategies since the reduction of epidemic channels will increase
the delay of the infection arrival and will thus allow for a
better preparation against the disease (for example vaccination).



These results could be helpful in designing early detection and
containment strategies in more involved models which go beyond
topology and which include additional features such as passenger
traffic in airlines or city populations.